YouTube API Design Decisions
Learn about the technical considerations that direct the design of the YouTube API.
Designing an API for a streaming service is an intricate task because of the complex nature of the system. It involves significant technical decisions, such as the API architecture style to use between the interacting entities and the protocols to adopt for transferring streaming data. In the following sections, we'll settle on the primary design considerations that we'll stick to while designing an API for the YouTube streaming service.
Design overview#
The following illustrations show a bird's eye view of YouTube's primary services, which consist of streaming, uploading, searching, commenting, and rating (liking or disliking) services. The upload service is used to upload the video contents to the blob storage and relevant metadata to the metadata database. The search service efficiently finds relevant videos from the vast database of videos. Similarly, the comment service enables users to post comments on a video, and these can be rated via the rating service.
Since we have covered the other services in our foundational design problems, we'll focus on the streaming service, and the services relevant to streaming, in the figure below. All streaming requests coming from the clients pass through the API gateway, which directs each one to the relevant service; that service, in turn, retrieves the relevant data from the persistence layer. For example, the ad service is responsible for handling any requests related to embedding ads in videos. Typical responsibilities of this service include communicating with other services to find optimal ads to serve specific users, choosing the number of ads to be served, and serving ads during playback.
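To make the gateway's routing role concrete, here is a minimal sketch of path-prefix routing at an API gateway. The service names and path prefixes are illustrative, not YouTube's actual topology:

```python
# Minimal sketch of path-prefix routing at an API gateway.
# Service names and paths are illustrative assumptions.

ROUTES = {
    "/stream": "streaming-service",
    "/ads": "advertisement-service",
    "/users": "user-data-service",
}

def route(path: str) -> str:
    """Return the back-end service responsible for a request path."""
    for prefix, service in ROUTES.items():
        if path.startswith(prefix):
            return service
    raise ValueError(f"no service registered for {path}")
```

A real gateway would also handle authentication, rate limiting, and load balancing across service instances, but the dispatch step reduces to a lookup like this one.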
The following table describes some of the essential components that are involved in the design of a streaming system.
Components and Services Details

Component or Service | Details |
--- | --- |
Streaming service | Delivers the manifest file and video segments to clients during playback |
Advertisements service | Finds optimal ads for specific users, decides the number of ads to serve, and serves them during playback |
Encoding service | Encodes, compresses, and segments uploaded videos into multiple formats and bitrates |
User data service | Manages user-related data, such as watch history and preferences |
API gateway | Receives client requests and routes each one to the relevant back-end service |
Databases | Store video metadata and other structured data |
Blob storage | Stores the video content uploaded by users |
Point to Ponder
Question
Is the manifest file generated statically during video processing or dynamically depending on the requests of individual users?
Manifest files can be generated statically or dynamically, depending on the requirements of the video streaming service and the complexity of the streamed content.
Static (one-time) manifest files are pre-generated and delivered to clients as soon as the streaming session begins. This means that the same manifest file is shared with all the users requesting the stream.
In contrast, dynamic manifest files are generated on-the-fly as soon as the request is placed on the server. Each generated manifest file takes into consideration the client’s location, device specifications, network conditions, preferences, etc., to generate a tailored response.
Although dynamic manifest files produce a better user experience and are suitable for dynamic ad insertion, their generation is resource-intensive and complex as compared to the static approach.
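For concreteness, a static master manifest in the HLS format might look like the following illustrative example; the bandwidths, resolutions, and paths are placeholder values. Each entry points the client to one rendition of the stream, from which it picks based on its network conditions:

```
#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=640x360
360p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2500000,RESOLUTION=1280x720
720p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080
1080p/playlist.m3u8
```

A dynamic generator would assemble a response like this per request, for example inserting ad markers or omitting renditions the requesting device cannot play.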
Workflow#
The video streaming process consists of the following two steps:
Publishing
Streaming
Publishing#
The video publishing process begins with capturing a video and uploading it to the back-end storage. During this process, complex operations are performed on the raw data, such as the encoding, compression, and segmentation of videos and their associated audio files. Moreover, the segments are stored in multiple formats to provide flexibility and an uninterrupted streaming experience to end users. Segments of viral and most-watched videos are later pushed to edge servers, such as CDNs, to provide a better user experience.
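The segmentation step can be sketched as follows. This is a toy model, not a real encoder: the segment size, bitrate ladder, and record layout are illustrative assumptions, and a real pipeline would transcode each segment per bitrate rather than copy the bytes:

```python
# Toy sketch of the publishing step: split a raw byte stream into
# fixed-size segments and record one entry per (segment, bitrate)
# rendition. Values are illustrative; real segments span seconds of video.

SEGMENT_BYTES = 4  # tiny for demonstration
BITRATES = [800_000, 2_500_000, 5_000_000]  # bits per second

def segment_video(raw: bytes) -> list[dict]:
    renditions = []
    for i in range(0, len(raw), SEGMENT_BYTES):
        chunk = raw[i:i + SEGMENT_BYTES]
        for bitrate in BITRATES:
            renditions.append({
                "segment_index": i // SEGMENT_BYTES,
                "bitrate": bitrate,
                "payload": chunk,  # a real encoder would transcode per bitrate
            })
    return renditions
```

Storing one rendition per (segment, bitrate) pair is what later lets the client switch bitrates at any segment boundary.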
Streaming#
When a user requests a video, the streaming server sends a manifest file to the client before the video. Using the manifest file, the client fetches video segments with different bitrates based on the network condition and type of the device. The manifest file also includes advertisement information that is played at different intervals during a video via the advertisement service.
A single video consists of many segments; therefore, the client sends a separate request for each video segment. Because the segments were generated at different bitrates in the publishing stage, the streaming service (the CDN in the illustration below) can support different devices and network conditions. On the client side, the video is decoded, decompressed, and played in a video player.
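The client's bitrate switching can be sketched as a simple selection rule: pick the highest rendition whose bitrate fits within the measured throughput. The bitrate ladder below is an illustrative assumption, and production players use more sophisticated heuristics (buffer level, throughput smoothing, etc.):

```python
# Sketch of client-side adaptive bitrate selection.
# The bitrate ladder is an illustrative assumption.

BITRATES = [800_000, 2_500_000, 5_000_000]  # bits per second, ascending

def pick_bitrate(throughput_bps: float, ladder=BITRATES) -> int:
    """Highest rendition that fits the measured throughput."""
    viable = [b for b in ladder if b <= throughput_bps]
    return max(viable) if viable else min(ladder)  # fall back to lowest
```

The client re-evaluates this choice before each segment request, which is what makes the stream adapt mid-playback.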
Note that we use CDN in the illustration below to show that many clients requesting the same stream can be served simultaneously via CDN.
At this point, we know how YouTube provides streaming by integrating various services within its architecture. In the following section, we decide on some important design issues that direct the design of the YouTube API in the next lesson.
Point to Ponder
Question
What happens if a user pauses a video, goes away for an extended time, comes back, and plays it again?
Since streaming is accomplished over a reliable transmission control protocol (TCP) connection, that connection will be terminated after the user is away for a while. When the user resumes playback, only the buffered media plays at first; the client playback device then establishes a new TCP connection and requests the next chunk/segment in the sequence.
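The client-side bookkeeping behind this resume behavior can be sketched as follows; the class and method names are illustrative assumptions:

```python
# Sketch of resume-after-pause: the client remembers the last buffered
# segment index, so after the idle TCP connection is torn down it can
# reconnect and request the next segment in sequence.
# Names are illustrative assumptions.

class PlaybackSession:
    def __init__(self):
        self.last_buffered = -1  # index of the last segment received

    def on_segment(self, index: int):
        """Record that a segment finished downloading."""
        self.last_buffered = index

    def next_request(self) -> int:
        """Segment to fetch when playback resumes over a new connection."""
        return self.last_buffered + 1
```

Because the position is tracked on the client, the server can remain stateless about paused sessions and simply serve whichever segment index is requested next.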
Design considerations#
In the YouTube design, we have three entities interacting with each other: the client, the API gateway, and the back-end services. Let's decide on the API architecture styles between them. Next, we describe the data formats and the HTTP versions suitable to adopt in the design of the YouTube API.
Architecture styles#
Considering the different architectural styles, let's expand the interaction between the client, the API gateway, and the back-end services.
Client to API gateway: In streaming, the primary task is to retrieve a video; therefore, we only perform the read operation on a resource, which is, in this case, a video. Since retrieving a video is a subset of the CRUD operations, this naturally fits the REST API architecture style. So, here, we adopt the REST style without adding unnecessary complexity between the interacting entities.
API gateway to back-end servers: The client requests, such as streaming, advertisement, or any other, are dynamically routed by the API gateway to the respective servers.
Each video has an associated group of attributes to be retrieved along with it. However, depending on the query parameters in the request, some attributes require filtering. Filtering helps avoid unnecessary fetching, parsing, and storage of data, making the process time-efficient. At first glance, we might reach for GraphQL to perform such selective data fetching from different services. However, the filtering operation, in this case, is performed by the individual services and not by GraphQL. Therefore, an additional GraphQL layer between the API gateway and back-end servers would add complexity and hurt performance.
Hence, keeping the CRUD operations on videos and filtering attributes in view, we employ the REST architecture style between the API gateway and back-end services.
HTTP version#
Although support for HTTP streaming was added in HTTP/1.1, it is relatively slower than HTTP/2.0. This is because HTTP/2.0 uses binary framing and compresses headers during transmission, which reduces the number of bytes sent. Similarly, HTTP/2.0 provides multiplexing, which makes it a better choice for sending multiple streams of content (audio/video segments, subtitles, etc.) over a single TCP connection without the head-of-line (HOL) blocking present in HTTP/1.1. Moreover, HTTP/2.0 is also resilient to network failures. Therefore, we adopt HTTP/2.0 in the design of our API. While HTTP/3.0 can be considered a good option, it is not yet widely enough adopted to guarantee compatibility across the different devices and Internet infrastructure.
Note: YouTube uses the QUIC protocol to retrieve video and audio quickly in different streams. In fact, QUIC was developed by Google, and YouTube was among its early adopters. It is supported in all YouTube mobile applications across different platforms.
Point to Ponder
Question
Can we use HTTP/1.1 for streaming a video?
Yes, we can use HTTP/1.1 for streaming via persistent connections, which are the default in HTTP/1.1 and are signaled with the Connection: keep-alive header. The server keeps the connection open until the video finishes or the client asks to terminate the connection. However, this will cause extra delays in communication due to the lack of multiplexing (which might be needed to fetch more than one segment in parallel to improve a user's experience) and header compression.
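An HTTP/1.1 segment fetch over a persistent connection might look like the following illustrative exchange; the host, path, and sizes are placeholder values:

```
GET /videos/v1/segments/0 HTTP/1.1
Host: stream.example.com
Connection: keep-alive

HTTP/1.1 200 OK
Content-Type: video/mp4
Content-Length: 1048576
Connection: keep-alive

<segment bytes>
```

The client then reuses the same connection for `/segments/1`, `/segments/2`, and so on, but each request must wait for the previous response to finish, which is exactly the head-of-line blocking that HTTP/2.0's multiplexing avoids.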
Data formats#
Since the audio and video data require efficient communication between the client and API gateway, it is imperative to use the binary format for sending such data. Our choice of HTTP/2.0 helps further, since its binary framing and header compression reduce transmission overhead.
However, the selection of the data format for metadata should be client-dependent. Some clients, such as mobile applications, do not offer web developer tools; for them, it is optimal to use binary formats for efficiency. For clients such as browsers, the JSON format is a good option for metadata because its human readability makes debugging easier.
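This per-client choice can be sketched as simple content negotiation. Serializing metadata as JSON for browsers and as a compressed binary payload otherwise is one plausible scheme; the Accept values, the use of gzip as the binary form, and the function name are illustrative assumptions:

```python
# Sketch of choosing a metadata wire format per client: JSON for
# browser clients, a compressed binary form for others.
# The Accept values and gzip choice are illustrative assumptions.

import gzip
import json

def encode_metadata(metadata: dict, accept: str) -> bytes:
    body = json.dumps(metadata).encode("utf-8")
    if "application/json" in accept:
        return body  # human-readable for browser clients
    return gzip.compress(body)  # compact binary for other clients
```

Keying the decision off the request's Accept header lets one endpoint serve both kinds of clients without separate URLs.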
Note: The video and audio are encoded in byte streams compressed with a supported compression algorithm—for example, gzip, deflate, or br.
Summary#
In this lesson, we discussed the workflow of a streaming service. Next, we decided on architecture styles between the interacting entities: REST for the client to API gateway and the API gateway to back-end services. Furthermore, we chose HTTP/2.0 as the application layer protocol due to its advantages over other HTTP versions for streaming services. We also adopted the binary format for sending or receiving audio and video data and the JSON format for the metadata.
Design Considerations | Client to API Gateway | API Gateway to Back-end Services |
--- | --- | --- |
API architecture style | REST | REST |
HTTP version | HTTP/2.0 | HTTP/2.0 |
Data format | Binary for media data; JSON for metadata | Binary for media data; JSON for metadata |